Inference with Correlated Clusters
Abstract
This paper introduces a method that permits valid inference with a finite number of heterogeneous, correlated clusters. Empirical analyses commonly use inference methods that assume each unit is independent. Panel data allow this assumption to be relaxed, because the correlations across clusters can be estimated and the independent variation in each cluster isolated for proper inference. Clusters may be correlated for many reasons, such as geographic proximity, similar institutions, or comparable industry compositions. Moreover, panel data models typically include time fixed effects, which mechanically induce correlations across clusters. The proposed inference procedure uses a Wald statistic and simulates the distribution of this statistic in a manner that is valid even for a small number of clusters. To account for correlations across clusters, the relationships among clusters are estimated and only the independent component of each cluster is used. The method is simple to use and requires only one estimation of the model. It can be employed for linear and nonlinear estimators. I present several sets of simulations and show that the inference procedure consistently rejects at the appropriate rate, even in the presence of highly correlated clusters for which traditional inference methods severely overreject.
Similar resources
A Practitioner's Guide to Cluster-Robust Inference
We consider statistical inference for regression when data are grouped into clusters, with regression model errors independent across clusters but correlated within clusters. Examples include data on individuals with clustering on village or region or other category such as industry, and state-year differences-in-differences studies with clustering on state. In such settings default standard erro...
New Approach for Customer Clustering by Integrating the LRFM Model and Fuzzy Inference System
This study aimed to provide a systematic method for analyzing the characteristics of customers’ purchasing behavior in order to improve the performance of customer relationship management systems. For this purpose, an improved LRFM model (comprising the Length, Recency, Frequency, and Monetary indices) was used, which is now a more common model than the basic RFM model apt for analyzing the cus...
Robust Inference with Clustered Data
In this paper we survey methods to control for regression model error that is correlated within groups or clusters, but is uncorrelated across groups or clusters. Then failure to control for the clustering can lead to understatement of standard errors and overstatement of statistical significance, as emphasized most notably in empirical studies by Moulton (1990) and Bertrand, Duflo and Mullainath...
Robust Inference with Clustered Data A
In this paper we survey methods to control for regression model error that is correlated within groups or clusters, but is uncorrelated across groups or clusters. Then failure to control for the clustering can lead to understatement of standard errors and overstatement of statistical significance, as emphasized most notably in empirical studies by Moulton (1990) and Bertrand, Duflo and Mullainath...
A Sequential Rejection Testing Method for High-Dimensional Regression with Correlated Variables.
We propose a general, modular method for significance testing of groups (or clusters) of variables in a high-dimensional linear model. In the presence of high correlations among the covariables, serious identifiability problems make it indispensable to focus on detecting groups of variables rather than singletons. We propose an inference method which allows to build in hierarchical structu...